Among the novel metrics used to study the relative importance of nodes in complex networks, k-core decomposition has found a number of applications in areas as diverse as sociology, proteinomics, graph visualization, and distributed system analysis and design. This paper proposes new distributed algorithms for the computation of the k-core decomposition of a network, with the purpose of (i) enabling the run-time computation of k-cores in "live" distributed systems and (ii) allowing the decomposition, over a set of connected machines, of very large graphs that cannot be hosted on a single machine. Lower bounds on the algorithms' complexity are given, and an exhaustive experimental analysis on real-world graphs is provided.

1 Introduction

In the last few years, a number of metrics and methods have been introduced for studying the relative "importance" of nodes within complex network structures. Examples include betweenness, eigenvector and closeness centrality indexes [5, 10]. Such studies have been applied in a variety of settings, including real networks like the Internet topology, social networks like co-authorship graphs, protein networks in bio-informatics, and so on.

Among these metrics, k-core decomposition is a well-established method for identifying particular subsets of the graph called k-cores, or k-shells [12]. Informally, a k-core is obtained by recursively removing all nodes of degree smaller than k, until the degree of all remaining vertices is larger than or equal to k. Nodes are said to have coreness k (or, equivalently, to belong to the k-shell) if they belong to the k-core but not to the (k + 1)-core. As an example of k-core decomposition for a sample graph, consider Figure 1. Note that by definition cores are "concentric", meaning that nodes belonging to the 3-core belong to the 2-core and 1-core as well. Larger values of "coreness", though, clearly correspond to nodes with a more central position in the network structure.
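The informal definition above can be sketched as a centralized "peeling" procedure. This is only a minimal illustration, not the linear-time centralized algorithm cited later in the paper; the function name `coreness_by_peeling` and the toy edge list are assumptions introduced here.

```python
from collections import defaultdict

def coreness_by_peeling(edges):
    """Return {node: coreness} by repeatedly removing a minimum-degree node."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    alive = set(adj)
    coreness = {}
    k = 0
    while alive:
        # Pick a remaining node of minimum residual degree (simple O(N) scan).
        u = min(alive, key=degree.get)
        # The coreness never decreases along the removal order.
        k = max(k, degree[u])
        coreness[u] = k
        alive.remove(u)
        for v in adj[u]:
            if v in alive:
                degree[v] -= 1
    return coreness

# Hypothetical toy graph: a triangle (1, 2, 3) with a pendant node 4.
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(coreness_by_peeling(toy_edges))
```

In the toy graph the pendant node has coreness 1, while the triangle forms the 2-core.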

k-core decomposition has found a number of applications; for example, it has been used to characterize social networks [12], to help in the visualization of complex graphs [1], to determine the role of proteins in complex proteinomic networks [2], and finally to identify nodes with good "spreading" properties in epidemiological studies [8].

Centralized algorithms for the k-core decomposition already exist [3]. Here, we consider the distributed version of this problem, which is motivated by the following scenarios:

• One-to-one scenario: The graph to be analyzed could be a "live" distributed system, such as a P2P overlay, that needs to inspect itself; one host is also one node in the graph, and connections among hosts are the edges. Given that cores with larger k are known to be good spreaders [8], this information could be used at run-time to optimize the diffusion of messages in epidemic protocols [11].

Figure 1: k-core decomposition for a sample graph.

• One-to-many scenario: The graph could be so large that it does not fit into a single host, due to memory restrictions; or its description could be inherently distributed over a collection of hosts, making it inconvenient to move each portion to a central site. So, one host stores many nodes and their edges. As an example, consider the Facebook social graph, with 500 million users (nodes) and more than 65 billion friend connections (edges) in December 2010; or the web crawls of Google and Yahoo, which stopped announcing the size of their indexes in 2005, when they both surpassed the milestone of 10 billion pages (nodes).

Interestingly enough, the two scenarios turn out to be related: the former can be seen as a special case of the "inherent distribution" of the latter taken to its extreme consequences, with each host storing only one node and its edges.

The contribution of this paper is a novel algorithm that can be adapted to both scenarios. Reversing the above reasoning, Section 3 first proposes a version that can be applied to the one-to-one scenario, and then shows how to migrate it to the one-to-many scenario, by efficiently putting a collection of nodes under the responsibility of a single host. We prove that the resulting algorithm completes the k-core decomposition in $O(N)$ rounds, with $N$ being the number of nodes; more precisely, Section 4 shows an upper bound equal to $N - K + 1$, with $K$ being the number of nodes with minimal degree, and describes a worst-case graph that requires exactly such a number of rounds. While such upper bounds are rather high, real-world graphs such as the Slashdot comment network, the citation graph of Arxiv or the Gnutella overlay network require a surprisingly low number of rounds, as demonstrated in the experiments described in Section 5.

2 Notation and system model 

Given an undirected graph $G = (V, E)$ with $N = |V|$ nodes and $M = |E|$ edges, we define the concept of k-core decomposition:

Definition 1. A subgraph $G(C)$ induced by the set $C \subseteq V$ is a k-core if and only if $\forall u \in C : d_{G(C)}(u) \geq k$, and $G(C)$ is the maximum subgraph with this property.

Definition 2. A node in $G$ is said to have coreness $k$ if and only if it belongs to the k-core but not to the (k + 1)-core.

Here, $d_G(u)$ and $k_G(u)$ denote the degree and the coreness of $u$ in $G$, respectively; in what follows, the subscript $G$ can be dropped when it is clear from the context. $G(C) = (C, E|C)$ is the subgraph of $G$ induced by $C$, where $E|C = \{(u, v) \in E : u, v \in C\}$.

The distributed system is composed of a collection of hosts $H$, whose overall goal is to compute the k-core decomposition of $G$. Each node $u$ is associated with exactly one host $h(u) \in H$, which is responsible for computing the coreness of $u$. Each host $x$ is thus responsible for a collection of nodes $V(x)$, defined as follows:

$V(x) = \{u : h(u) = x\}$.

Each host $x$ has access to two functions, $\mathrm{neighbor}_V()$ and $\mathrm{neighbor}_H()$, that return a set of neighbor nodes and neighbor hosts, respectively. Host $x$ may apply these functions to either itself or to the nodes under its responsibility; it cannot obtain information about neighbors of other hosts or of nodes under the responsibility of other hosts. Formally, the functions are defined as follows:

$\forall u \in V : \mathrm{neighbor}_V(u) = \{v : (u, v) \in E\}$

$\forall x \in H : \mathrm{neighbor}_V(x) = \{v : (u, v) \in E \wedge u \in V(x)\}$

$\forall x \in H : \mathrm{neighbor}_H(x) = \{y : (u, v) \in E \wedge u \in V(x) \wedge v \in V(y)\}$

A special case occurs when the graph to be analyzed coincides with the distributed system, i.e., $H = V$. When this happens, the label $u$ will be used to denote both the node and the host, and in general we will use the terms node and host interchangeably. Also, note that in this case $\mathrm{neighbor}_V(u) = \mathrm{neighbor}_H(u)$.
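The three neighbor functions can be made concrete with a small sketch. The edge list, the `host_of` mapping (playing the role of $h(u)$) and the function names are hypothetical and only serve to illustrate the definitions; note that, read literally, $\mathrm{neighbor}_V(x)$ may contain nodes of $V(x)$ and $\mathrm{neighbor}_H(x)$ may contain $x$ itself.

```python
def neighbor_v_node(u, edges):
    """neighbor_V(u): nodes adjacent to u in an undirected edge list."""
    return {b for a, b in edges if a == u} | {a for a, b in edges if b == u}

def neighbor_v_host(x, edges, host_of):
    """neighbor_V(x): neighbors of all nodes under the responsibility of host x."""
    result = set()
    for u, h in host_of.items():
        if h == x:
            result |= neighbor_v_node(u, edges)
    return result

def neighbor_h(x, edges, host_of):
    """neighbor_H(x): hosts responsible for some node in neighbor_V(x)."""
    return {host_of[v] for v in neighbor_v_host(x, edges, host_of)}

# Hypothetical path graph 0-1-2-3 split over two hosts "a" and "b".
edges = [(0, 1), (1, 2), (2, 3)]
host_of = {0: "a", 1: "a", 2: "b", 3: "b"}
print(neighbor_v_host("a", edges, host_of))   # nodes adjacent to 0 or 1
print(neighbor_h("a", edges, host_of))        # hosts owning those nodes

# Special case H = V: every node is its own host.
identity = {u: u for u in host_of}
print(neighbor_h(1, edges, identity) == neighbor_v_node(1, edges))
```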

Hosts communicate through reliable channels. For the duration of the computation, we assume that hosts do not crash. 
3 Algorithm



Our distributed algorithm is based on the property of locality of the k-core decomposition: due to the maximality of cores, the coreness of node $u$ is the largest value $k$ such that $u$ has at least $k$ neighbors that belong to a k-core or a larger core. More formally,

Theorem 1 (Locality). For each $u \in V$, $k(u) = k$ if and only if

(i) there exists a subset $V_k \subseteq \mathrm{neighbor}_V(u)$ such that $|V_k| = k$ and $\forall v \in V_k : k(v) \geq k$;

(ii) there is no subset $V_{k+1} \subseteq \mathrm{neighbor}_V(u)$ such that $|V_{k+1}| = k + 1$ and $\forall v \in V_{k+1} : k(v) \geq k + 1$.

Proof.

($\Rightarrow$) Since $k(u) = k$, there exists a (maximal) set $W_k \subseteq V$ such that $u \in W_k$ and $G(W_k)$ is a k-core, and there is no set $W_{k+1} \subseteq V$ such that $u \in W_{k+1}$ and $G(W_{k+1})$ is a (k + 1)-core. Indeed, $G(W_k)$ is a k-core for all nodes in $W_k$, and so it is for at least $k$ neighbors of $u$, because of maximality. Part (ii) follows by contradiction: assume that $v_1, \ldots, v_{k+1}$ are $k + 1$ neighbors of $u$ that have coreness $k + 1$ or more. Denote by $W_1, \ldots, W_{k+1}$ the subsets of nodes inducing their corresponding (k + 1)-cores. Consider the set $U = \{u\} \cup \bigcup_{i=1}^{k+1} W_i$, which merges $u$ with all the $W_i$ sets. The subgraph $G(U)$ induced by $U$ contains at least $k + 1$ nodes (because each $W_i$ contains at least $k + 1$ nodes); for each node $v \in U$, $d_{G(U)}(v) \geq k + 1$, because $U$ is the union of $k + 1$ (k + 1)-cores and the $k + 1$ neighbors of $u$ are included in it. But this proves that a (k + 1)-core exists ($G(U)$ may well not be maximal) and $u$ belongs to it. Contradiction.

($\Leftarrow$) For each node $v_i \in V_k$, $1 \leq i \leq k$, $k(v_i) \geq k$ implies the existence of a set $W_i \subseteq V$ such that $G(W_i)$ is a k-core of $G$ and $v_i \in W_i$. Consider the set $U = \{u\} \cup \bigcup_{i=1}^{k} W_i$. The subgraph $G(U)$ induced by $U$ contains at least $k + 1$ nodes (because each $W_i$ contains at least $k + 1$ nodes); for each node $v \in U$, $d_{G(U)}(v) \geq k$, because $U$ is the union of $k$ k-cores and the $k$ neighbors of $u$ are included in $U$. Thus, because of maximality, $G(U)$ is part of a k-core of $G$ containing $u$. Suppose now that there exists a subset $W \subseteq V$ such that $G(W)$ is a (k + 1)-core containing $u$; this means that $u$ has at least $k + 1$ neighbors, each of them with coreness $k + 1$ or more; but this contradicts our hypothesis (ii). We can conclude that $k(u) = k$. □
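As a sanity check of the locality property, the following sketch recomputes the coreness of each node from the coreness of its neighbors only. The adjacency lists and coreness values belong to a hypothetical toy graph (a triangle 1, 2, 3 with a pendant node 4), and `locality_value` is a name introduced here for illustration.

```python
def locality_value(u, adj, coreness):
    """Largest i such that u has at least i neighbors with coreness >= i."""
    best = 0
    for i in range(1, len(adj[u]) + 1):
        supporters = sum(1 for v in adj[u] if coreness[v] >= i)
        if supporters >= i:
            best = i
    return best

# Hypothetical toy graph: triangle (1, 2, 3) plus pendant node 4 attached to 3.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
coreness = {1: 2, 2: 2, 3: 2, 4: 1}

for u in adj:
    print(u, locality_value(u, adj, coreness), coreness[u])
```

For every node, the value derived from the neighbors coincides with the node's own coreness, as conditions (i) and (ii) require.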

The locality property tells us that the information about the coreness of the neighbors of a node is sufficient to compute its own coreness. Based on this idea, our algorithm works as follows: each node produces an estimate of its own coreness and communicates it to its neighbors; at the same time, it receives estimates from its neighbors and uses them to recompute its own estimate; in the case of a change, the new value is sent to the neighbors and the process goes on until convergence.

This outline must be formalized in a real algorithm; we do it twice, for both the one-to-one and the one-to-many 
scenarios. We conclude the section with a few ideas about termination detection, that are valid for both versions. 

3.1 One host, one node 

Each node u maintains the following variables: 

• core is an integer that represents the local estimate of the coreness of u; it is initialized with the local degree. 

• est[] is an integer array containing one element for each neighbor; est[v] represents the most up-to-date estimate of the coreness of v known by u. In the absence of more precise information, all its entries are initialized to $+\infty$.

• changed is a Boolean flag set to true if core has been recently modified; initially set to false. 

The protocol is described in Algorithm 1. Each node $u$ starts by broadcasting a message $(u, d(u))$ containing its identifier and degree to all its neighbors. Whenever $u$ receives a message $(v, k)$ such that $k < est[v]$, the entry est[v] is updated with the new value. A new temporary estimate $t$ is computed by function computeIndex() in Algorithm 2. If $t$ is smaller than the previously known value of core, core is modified and the changed flag is set to true. Function computeIndex() returns the largest value $i$ such that there are at least $i$ entries equal to or larger than $i$ in est, computed as follows: the first three loops compute how many nodes have estimate $i$ or more, $1 \leq i \leq k$, and store this value in array count. The while loop searches for the largest value $i$ such that $count[i] \geq i$, starting from $k$ and going down to 1.
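The description of computeIndex() translates almost line by line into the following sketch. As an assumption made for self-containment, the neighbor list is passed explicitly instead of being derived from the node identifier, and `est` is a dictionary from neighbor to estimate in which unknown estimates are `math.inf`.

```python
import math

def compute_index(est, neighbors, k):
    """Largest i <= k such that at least i neighbors have estimate >= i."""
    count = [0] * (k + 1)              # count[1..k]; index 0 is unused
    for v in neighbors:
        j = min(k, est[v])             # estimates above k are capped at k
        count[j] += 1
    for i in range(k, 1, -1):          # suffix sums: count[i] = #estimates >= i
        count[i - 1] += count[i]
    i = k
    while i > 1 and count[i] < i:
        i -= 1
    return i

# A node of degree 3 that has heard estimates 1, 3 and 3 from its neighbors.
print(compute_index({"a": 1, "b": 3, "c": 3}, ["a", "b", "c"], 3))

# Before any message arrives, all entries are +infinity and the degree is kept.
print(compute_index({"a": math.inf, "b": math.inf}, ["a", "b"], 2))
```

Capping each estimate at k keeps the count array of size k + 1, so one call costs time linear in the degree of the node.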

The protocol execution is divided into periodic rounds: every $\delta$ time units, variable changed is checked; if the local estimate has been modified, the new value is sent to all the neighbors and changed is set back to false. This periodic behavior is used to avoid flooding the system with a flow of estimate messages that are immediately superseded by the following ones.

It is worth remarking that during the execution, variable core at node $u$ (i) is always larger than or equal to the real coreness value of $u$, and (ii) cannot increase upon the receipt of an update message. Informally, these two observations are the basis of the correctness proof contained in Section 4.

3.1.1 Example 

We describe here a run of the algorithm on the simple sample graph reported in Fig. 2. At the first round, all nodes $v$ have $core = d(v)$; nodes 2, 3, 4 and 5 send the same value core = 3 to their neighbors: these messages do not cause any change in the estimates of the coreness of receiving nodes. However, in the same round, nodes 1 and 6 notify their core = 1 value to nodes 2 and 5, respectively: as a consequence, nodes 2 and 5 update their estimates to core = 2. Thus, in the second round another message exchange occurs, since nodes 2 and 5 notify their neighbors that their local estimate changed, i.e., they send core = 2 to nodes 1, 3, 4 and 3, 4, 6, respectively. This causes an update core = 2 at nodes 3 and 4, which have to send out another update core = 2 to nodes 2 and 4 and nodes 3 and 5, respectively, in the third round. However, no local estimate changes from now on, which in turn means that the algorithm has converged. Finally, core = 2 for nodes 2, 3, 4, 5 and core = 1 for nodes 1, 6.
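The run above can be reproduced with a small synchronous simulation. The adjacency lists are an assumption: they are inferred from the messages described in the run (the figure itself is not available here), and the helper names are introduced only for this sketch. Estimates are recomputed once per round after all messages are delivered, which simplifies the per-message handling of the protocol.

```python
import math

def compute_index(est_u, k):
    """Largest i <= k such that at least i neighbor estimates are >= i."""
    count = [0] * (k + 1)
    for value in est_u.values():
        count[min(k, value)] += 1
    for i in range(k, 1, -1):
        count[i - 1] += count[i]
    i = k
    while i > 1 and count[i] < i:
        i -= 1
    return i

def simulate(adj):
    """Return (final core estimates, number of rounds in which messages are sent)."""
    core = {u: len(adj[u]) for u in adj}                    # initial estimate: degree
    est = {u: {v: math.inf for v in adj[u]} for u in adj}
    outbox = dict(core)                                     # round 1: everyone sends
    rounds = 0
    while outbox:
        rounds += 1
        for v, k in outbox.items():                         # deliver all messages
            for u in adj[v]:
                if k < est[u][v]:
                    est[u][v] = k
        outbox = {}
        for u in adj:                                       # recompute estimates
            t = compute_index(est[u], core[u])
            if t < core[u]:
                core[u] = t
                outbox[u] = t                               # to be sent next round
    return core, rounds

# Edges inferred from the run: 1-2, 2-3, 2-4, 3-4, 3-5, 4-5, 5-6.
adj = {1: [2], 2: [1, 3, 4], 3: [2, 4, 5], 4: [2, 3, 5], 5: [3, 4, 6], 6: [5]}
print(simulate(adj))
```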
Algorithm 1: Distributed algorithm to compute the k-core decomposition, executed by node u.

on initialization do
    core ← d(u);
    foreach v ∈ neighbor_V(u) do est[v] ← ∞;
    send (u, core) to neighbor_V(u);

on receive (v, k) do
    if k < est[v] then
        est[v] ← k;
        t ← computeIndex(est, u, core);
        if t < core then
            core ← t;
            changed ← true;

repeat every δ time units (round duration)
    if changed then
        send (u, core) to neighbor_V(u);
        changed ← false;

Algorithm 2: int computeIndex(int[] est, int u, int k)

for i = 1 to k do count[i] ← 0;
foreach v ∈ neighbor_V(u) do
    j ← min(k, est[v]);
    count[j] ← count[j] + 1;
for i = k downto 2 do
    count[i − 1] ← count[i − 1] + count[i];
i ← k;
while i > 1 and count[i] < i do
    i ← i − 1;
return i;

Figure 2: A simple example describing the run of the algorithm.



3.1.2 Optimization 

Depending on the communication medium available, some optimizations are possible. For example, if a broadcast medium is used (as in a wireless network) and the neighbors are all in the broadcast range, the send to primitive can actually be implemented through a broadcast. If the send to primitive is implemented through point-to-point send operations, a simple optimization is the following: an update message $(u, core)$ is sent to a node $v$ if and only if $core < est[v]$; in other words, it is sent only if node $u$ knows that the new local estimate core has the potential of having an effect on the coreness of $v$; otherwise, it is skipped. In our experiments, described in Section 5, this optimization has been shown to reduce the number of exchanged messages by approximately 50%.
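The point-to-point optimization amounts to a filter on the recipients of an update. A minimal sketch, with a hypothetical helper name and made-up estimates:

```python
def update_recipients(core, est):
    """Neighbors v that should receive the new estimate, i.e. those with core < est[v]."""
    return [v for v, known in est.items() if core < known]

# Hypothetical state: the node just lowered its estimate to 2.
est = {"a": 3, "b": 2, "c": 1}
print(update_recipients(2, est))   # only "a" currently has a larger estimate
```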

3.2 One host, multiple nodes 

The algorithm described in the previous section can be easily generalized to the case where a host $x$ is responsible for a collection of nodes $V(x)$: $x$ runs the algorithm on behalf of its nodes, storing the estimates for all of them and sending messages to the hosts that are responsible for their neighbors. Described in this way, the new version of the algorithm looks trivial; an interesting optimization is possible, though. Whenever a host receives a message for a node $u \in V(x)$, it "internally emulates" the protocol: the estimates received from outside can generate new estimates for some of the nodes in $V(x)$; in turn, these can generate other estimates, again in $V(x)$; and so on, until no new internal estimate is generated and the nodes in $V(x)$ become quiescent. At that point, all the new estimates that have been produced by this process are sent to the neighboring hosts, where they can ignite these cascading changes all over again.

Each host $x$ maintains the following variables:

• est[] is an integer array containing one element for each node in $V(x) \cup \mathrm{neighbor}_V(x)$; est[v] represents the most up-to-date estimate of the coreness of $v$ known by $x$. Given that elements of $\mathrm{neighbor}_V(x)$ could belong to $V(x)$ (i.e., some of the neighbors of nodes in $V(x)$ could be under the responsibility of $x$), we store all their estimates in est[] instead of having a separate array core[] for just the nodes in $V(x)$.

• changed[] is a Boolean array containing one element for each node in $V(x)$; changed[v] is true if and only if the estimate of $v$ has changed since the last broadcast.

The protocol is described in Algorithm 3. At the beginning, all nodes $v \in V(x)$ are initialized to $est[v] = d(v)$; in the absence of more precise information, all other entries are initialized to $+\infty$. Function improveEstimate() is run to compute the best estimates $x$ can obtain with the local information; then, all the current estimates for the nodes in $V(x)$ are sent to all hosts.

Whenever a message is received, the array est is updated based on the content of the message; function improveEstimate() is called to take into account the new information that $x$ may have received.

Periodically, host $x$ computes the set $S$ of all pairs $(v, est[v])$ such that (i) $x$ is responsible for $v$ and (ii) est[v] has changed since the last broadcast. If $S$ is not empty, it is sent to all hosts in the system.

Function improveEstimate() (Algorithm 4) performs the local emulation of our algorithm. In the body of the while loop, $x$ tries to improve the estimates by calling computeIndex() on each of the nodes it is responsible for. If any of the estimates is changed, variable again is set to true and the loop is executed another time, because a variation in the estimate of some node may lead to changes in the estimate of other nodes.
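The local emulation can be sketched as a fixed-point loop over the nodes owned by a host. The data layout is an assumption made for illustration: `est` is a dictionary covering local nodes and their remote neighbors, `adj` gives the neighbors of each local node, and the function returns the set of local nodes whose estimate changed instead of updating a changed[] array.

```python
import math

def compute_index(est, neighbors, k):
    """Largest i <= k such that at least i neighbors have estimate >= i."""
    count = [0] * (k + 1)
    for v in neighbors:
        count[min(k, est[v])] += 1
    for i in range(k, 1, -1):
        count[i - 1] += count[i]
    i = k
    while i > 1 and count[i] < i:
        i -= 1
    return i

def improve_estimate(est, local_nodes, adj):
    """Lower the estimates of local nodes until no further change is possible."""
    changed = set()
    again = True
    while again:
        again = False
        for u in local_nodes:
            k = compute_index(est, adj[u], est[u])
            if k < est[u]:
                est[u] = k
                changed.add(u)
                again = True      # a change may trigger changes at other local nodes
    return changed

# Hypothetical host owning nodes 1, 2, 3; nodes 4 and 5 live on other hosts.
adj = {1: [2], 2: [1, 3, 4], 3: [2, 4, 5]}
est = {1: 1, 2: 3, 3: 3, 4: math.inf, 5: math.inf}
print(improve_estimate(est, [1, 2, 3], adj), est)
```

In this example the low degree of node 1 lowers the estimate of node 2, which in turn lowers that of node 3, all without any message leaving the host.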

3.2.1 Communication policy 

There are two policies for disseminating the estimate updates. The above version of the algorithm assumes that a broadcast 
medium is available. This means that a single message containing all the updates received since the last round could be 
created and sent to all. 

Alternatively, we could adopt a communication system based on point-to-point send operations. In this case, it does not make sense to send all updates to all nodes, because each update is interesting only for a subset of nodes. So, for each host $y \in H$, we create a message containing only those updates that could be interesting for $y$. The modifications to be applied to Algorithm 3 are contained in Algorithm 5.

3.2.2 Node-hosts assignment policy 

The graph to be analyzed could be "naturally" split among the different hosts, or nodes could be assigned to hosts based on a well-defined policy. It is difficult to identify efficient heuristics to perform the assignment in the general case. In this paper, we adopt a very simple policy: assuming that node identifiers are integers in the range $[0 \ldots N - 1]$ and host identifiers are integers in the range $[0 \ldots |H| - 1]$, each node $u$ is assigned to host $(u \bmod |H|)$.
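The modulo policy can be written in a couple of lines; the function name and the example sizes are chosen here only for illustration.

```python
def assign_nodes(num_nodes, num_hosts):
    """Map each host identifier to the list of node identifiers it is responsible for."""
    hosts = {x: [] for x in range(num_hosts)}
    for u in range(num_nodes):
        hosts[u % num_hosts].append(u)
    return hosts

# Seven nodes spread over three hosts.
print(assign_nodes(7, 3))
```

The policy balances the number of nodes per host but ignores the edge structure, so neighboring nodes generally end up on different hosts.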

3.3 Termination 

To complete both algorithms, we need to discuss a mechanism to detect when convergence to the correct coreness values has been reached. There are plenty of alternatives:

• Centralized approach: each host may inform a centralized server whenever no new estimate is generated during a round; when all hosts are in this state, messages stop flowing and the protocol can be terminated. This is particularly suited for the "one host, multiple nodes" scenario, where it corresponds to a master-slaves approach.

• Decentralized approach: epidemic protocols for aggregation [6] enable the decentralized computation of global properties in $O(\log |H|)$ rounds. These protocols could be used to compute the last round in which any of the hosts has generated a new estimate (namely, the execution time): when this value has not been updated for a while, hosts may detect the termination of the protocol and start using the computed coreness.

• Fixed number of rounds: as shown in Section 5, the decomposition of most real-world graphs can be completed in a very small number of rounds (a few tens); furthermore, after very few rounds the estimate error is extremely low. If an approximate k-core decomposition is sufficient, running the protocol for a fixed number of rounds is an option.



Algorithm 3: Distributed algorithm to compute the k-core decomposition, executed by host x.

on initialization do
    foreach v ∈ neighbor_V(x) do est[v] ← +∞;
    foreach u ∈ V(x) do est[u] ← d(u);
    improveEstimate(est);
    S ← {(u, est[u]) : u ∈ V(x)};
    send (S) to neighbor_H(x);

on receive (S) do
    foreach (v, k) ∈ S do
        if k < est[v] then est[v] ← k;
    improveEstimate(est);

repeat every δ time units (round duration)
    S ← ∅;
    foreach u ∈ V(x) do
        if changed[u] then
            S ← S ∪ {(u, est[u])};
            changed[u] ← false;
    if S ≠ ∅ then
        send (S) to neighbor_H(x);

Algorithm 4: improveEstimate(int[] est)

again ← true;
while again do
    again ← false;
    foreach u ∈ V(x) do
        k ← computeIndex(est, u, est[u]);
        if k < est[u] then
            est[u] ← k;
            changed[u] ← true;
            again ← true;

Algorithm 5: Code to be substituted in Algorithm 3

repeat every δ time units (round duration)
    foreach y ∈ neighbor_H(x) do
        S ← {(u, est[u]) : u ∈ V(x) ∧ changed[u] ∧ (u, v) ∈ E ∧ v ∈ V(y)};
        if S ≠ ∅ then
            send (S) to y;
    foreach u ∈ V(x) do
        changed[u] ← false;



4 Correctness proofs 

We now prove that our algorithms are correct and eventually terminate. While we focus on the one-to-one scenario, the 
results can be easily extended to the one-to-many case. 
4.1 Safety and liveness

Theorem 2 (Safety). During the execution, variable core at each node $u$ is always larger than or equal to $k(u)$.

Proof. By contradiction, suppose there exists a node $u_1$ such that $core(u_1) < k(u_1)$. By Theorem 1, there is a set $V_1$ such that $|V_1| = k(u_1)$ and for each $v \in V_1 : k(v) \geq k(u_1)$. In order to set $core(u_1)$ smaller than $k(u_1)$, $u_1$ must have received a message containing an estimate smaller than $k(u_1)$ from at least one of the nodes in $V_1$. Formally, $u_1$ received a message $(u_2, core(u_2))$ from $u_2$ at time $t_2$, such that $u_2 \in V_1$ and $core(u_2) < k(u_1)$. Given that $k(u_1) \leq k(u_2)$ (because $u_2 \in V_1$), we conclude that $core(u_2) < k(u_2)$: in other words, we found another node whose estimate is smaller than its coreness. By applying Theorem 1 again, we derive that $u_2$ received a message $(u_3, core(u_3))$ from $u_3$ at time $t_3 < t_2$, such that $core(u_3) < k(u_2) \leq k(u_3)$. This reasoning leads to an infinite sequence of nodes $u_1, u_2, u_3, \ldots$ such that $core(u_i) < k(u_i)$ and $u_i$ received a message from $u_{i+1}$ at time $t_{i+1}$, with $t_i > t_{i+1}$. Given the finite number of nodes, this sequence contains a cycle $u_i, u_{i+1}, \ldots, u_j = u_i$, but this means $t_i > t_{i+1} > \ldots > t_j = t_i$, a contradiction. □

Theorem 3 (Liveness). There is a time after which the variable core at each node $u$ is always equal to $k(u)$.

Proof. By Theorem 2, variable $core(u)$ cannot be smaller than $k(u)$; by construction, variable core cannot grow. So, if we prove that the estimate will eventually become equal to the actual coreness, we have proven the theorem. The proof is by induction on the coreness $k(u)$.

• $k(u) = 0$: in this case, $u$ is isolated. Its degree, used to initialize core, is equal to its coreness and the protocol terminates at the very beginning.

• $k(u) = 1$: by contradiction, assume that $core(u) > 1$ never converges to $k(u)$. This means that there is at least one node $u_1$ with coreness $k(u_1) = 1$, neighbor of $u = u_0$, that will never send a message $(u_1, 1)$ to $u_0$ because its variable core never reaches 1 ($core(u_1) > 1$). Reasoning in the same way, we can find another node $u_2$, different from $u_0$, such that $(u_1, u_2) \in E$, $k(u_2) = 1$ and $core(u_2) > 1$ forever. Going on in this way, we can build an infinite sequence of nodes $u_0, u_1, u_2, \ldots$ connected to each other, all of them having $k(u_i) = 1$ and $core(u_i) > 1$ and such that $u_i \neq u_{i-2}$ for $i \geq 2$. Given the finite number of nodes, there is at least one cycle with three nodes or more in this sequence; but all nodes belonging to such a cycle would have coreness at least 2, a contradiction.

• Induction step: by contradiction, suppose there is a node $u_1$ such that $k(u_1) = k > 1$ and $core(u_1) > k$ forever. By Theorem 1, there are $f \geq k$ neighbors of $u_1$ with coreness greater than or equal to $k$, and $d(u_1) - f$ neighbors of $u_1$ whose coreness is smaller than $k$. If $f = k$, $u_1$ will eventually receive $d(u_1) - k$ estimates smaller than $k$ (by induction), while the other $k$ estimates will always be larger than or equal to $k$ (by Theorem 2). So, $u_1$ eventually sets $core(u_1)$ equal to $k$, a contradiction. If $f \geq k + 1$, there is at least one node $u_2$ among those $f$ such that $k(u_2) = k$ (otherwise, having $f \geq k + 1$ neighbors with coreness $k + 1$ or more, $k(u_1)$ would be $k + 1$ or more, a contradiction) and $core(u_2) > k$ forever (otherwise, $u_1$ would have received $f \geq k + 1$ updates equal to $k$ or more, setting $core(u_1) = k$, a contradiction). Note that the remaining $f - 1 \geq k$ neighbors of $u_1$ have coreness $k + 1$ or more. By reasoning similarly as above, we can build an infinite sequence of nodes $u_1, u_2, u_3, \ldots$ such that $k(u_i) = k$, $d(u_i) > k$ and $core(u_i) > k$, with $u_i$ waiting for a message from $u_{i+1}$ to lower $core(u_i)$ to $k$. As above, the finite number of nodes implies that the sequence contains at least one cycle $C = \{u_i, u_{i+1}, u_{i+2}, \ldots, u_j = u_i\}$. Now, for each of the nodes $u_i \in C$, consider $k$ neighbors $v_i^1, \ldots, v_i^k$ of $u_i$ such that $k(v_i^j) > k$. Let $V_i^j$ be a (k + 1)-core containing $v_i^j$ (such cores exist because their coreness is larger than $k$). Consider now the set $U$ defined as the union of all nodes $u_i \in C$ and all (k + 1)-cores defined above:

$U = C \cup \bigcup_{u_i \in C,\, 1 \leq j \leq k} V_i^j$

Consider the subgraph $G(U)$ induced by $U$; in such a graph, all nodes have at least $k + 1$ neighbors, because the nodes in the cores have at least $k + 1$ neighbors and each of the nodes in $C$ has $k$ neighbors plus a distinct node that follows in the cycle (by construction). Thus, $G(U)$ is a (k + 1)-core containing $C$, contradicting the assumption that the nodes in $C$ have coreness $k$. □
4.2 Time complexity



We proved that our algorithms eventually converge to the correct coreness; we now discuss upper bounds on the execution 
time, defined as the total number of rounds during which at least one node broadcasts its new estimate (when no new 
estimates are produced, the algorithm stops and the correct values have been obtained). 

For this purpose, we assume that rounds are synchronous; during one round, each node receives all messages addressed to it in the previous round (if any), computes a new coreness estimate and broadcasts a message to all its neighbors if the estimate has changed with respect to the previous round. At round 1, each node broadcasts its current estimate (equal to its degree) to all its neighbors. To simplify the analysis, no further optimizations are applied. In the final round, messages are sent but they do not cause any variation in the estimates, so the protocol terminates.

The first observation is that after the first round, in any subsequent round before the final one at least one node must change its own estimate, reducing it by at least 1. This leads to the following theorem:

Theorem 4. Given a graph $G = (V, E)$, the execution time is bounded by $1 + \sum_{u \in V} [d(u) - k(u)]$.

Proof. The quantity $d(u) - k(u)$ represents the "initial error" at node $u$, i.e., the difference between the initial estimate (the degree) and the actual coreness of $u$. In the worst case, at most one message is broadcast per round, and each broadcast reduces the error by one unit, apart from the last one which has no effect. Thus the execution time is bounded by the sum of all initial errors plus one. □
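As a small worked instance of this bound, the sketch below evaluates it on the degrees and final core values reported in the example run of Section 3.1.1 (degree 3 for nodes 2 to 5, degree 1 for nodes 1 and 6). The function name is introduced here for illustration only.

```python
def theorem4_bound(degree, coreness):
    """One plus the sum of the initial errors d(u) - k(u) over all nodes."""
    return 1 + sum(degree[u] - coreness[u] for u in degree)

# Values taken from the example run of Section 3.1.1.
degree = {1: 1, 2: 3, 3: 3, 4: 3, 5: 3, 6: 1}
coreness = {1: 1, 2: 2, 3: 2, 4: 2, 5: 2, 6: 1}
print(theorem4_bound(degree, coreness))
```

The four nodes with initial error 1 give a bound of 5 rounds, whereas the run described in Section 3.1.1 needs only three rounds of message exchange.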
While the previous bound is based on the knowledge of the actual coreness index of nodes, we can define a bound on 
the execution time that depends only on the graph size: 

Theorem 5. The execution time is not larger than N. 

Proof. Given a run of the algorithm, denote $A(r) = \{u \in V : core(u) = k(u) \text{ at round } r\}$. We make the following observations:

i) $A(1) \neq \emptyset$: each node $u$ with minimal degree $\delta$ is included in $A(1)$. In fact, $u$ is such that $k(u) = \delta$, otherwise there would be a node $v \in \mathrm{neighbor}_V(u)$ with a degree less than $\delta$, which is impossible because $\delta$ is minimal. Given that $core(u) = \delta$ at round 1 by initialization, $u$ belongs to $A(1)$.

ii) If $u \in A(r)$, then $u$ does not send any message for all remaining rounds $r + 2, r + 3, \ldots$.

iii) $A(r) \subseteq A(r + 1)$ for all $r$.

We denote by $T$ the smallest round index at which $A(T) = V$. By definition, the execution time equals $T + 1$ (see footnote 1).

Denote $m(r) = \min\{k(u) : u \notin A(r)\}$, i.e., the minimal coreness of a node that did not yet attain the correct value at round $r$. Also, denote $M(r) = \{v : k(v) = m(r), v \notin A(r)\}$, the set of all such nodes.

Assume $A(r) \neq V$, so that $M(r) \neq \emptyset$: at round $r + 1$ there must exist $v \in M(r)$ such that $v \in A(r + 1)$, i.e., $v$ attains the correct value at round $r + 1$. In fact, observe that at rounds $r + 2, r + 3, \ldots$, only nodes in $M(r)$ can exchange $m(r)$ values due to ii). Thus, if no node in $M(r)$ has attained the correct value at round $r + 1$, it means that all nodes in $M(r)$ have at least $m(r) + 1$ neighbors whose estimate is larger than $m(r)$ at round $r + 1$. However, nodes with $k(v) \leq m(r)$ that belong to $A(r)$ will never notify such value again. But, by definition of $m(r)$, no lesser estimate will be broadcast. Hence, the correct estimate $m(r)$ at such nodes will never be attained, contradicting Theorem 3.

We hence proved that $D(r) = A(r) \setminus A(r - 1) \neq \emptyset$ for $r = 1, \ldots, T$, where we let $A(0) = \emptyset$ for the sake of notation and $A(1) \neq \emptyset$ because of i). Also, it is easy to see that $V = A(T) = \bigcup_{r=1}^{T} D(r)$ and $D(r) \cap D(s) = \emptyset$ for $r \neq s$. Thus,

$N = \left| \bigcup_{r=1}^{T} D(r) \right| = \sum_{r=1}^{T} |D(r)| \geq T$

Footnote 1: This is due to the fact that, by our definition, the execution time also includes the last round, in which updates are sent but they have no further effect on the computed coreness.
The tighter bound $T \leq N - 1$ is obtained by contradiction. Consider round $N - 2$ and assume $T > N - 1$. Using the same arguments as above, $|A(N - 2)| \geq N - 2$.

Case $|A(N - 2)| = N - 1$: the only remaining node $v$ such that $core(v) \neq k(v)$ would obtain the true coreness of its neighbors at round $N - 1$, against our assumption.

Case $|A(N - 2)| = N - 2$: let us denote by $v_1$ and $v_2$ the pair of nodes such that $core(v_i) \neq k(v_i)$ at round $N - 2$. It is easy to see that such nodes must be neighbors, otherwise all their neighbors would have the correct core value and they would receive those estimates and compute the correct value by round $N - 1$. Also, they both have all remaining neighbors in the set $A(N - 2)$, otherwise one of them would have degree 1, which is not possible since it would belong to $A(1)$. However, $core(v_i) = k(v_i) + 1$ for $i = 1, 2$: in fact $core(v_i) > k(v_i)$ at round $N - 1$ and exactly one neighbor has a wrong estimation. Also, $core(v_1) \geq k(v_2) + 1$ and $core(v_2) \geq k(v_1) + 1$. Thus, $core(v_1) = k(v_1) + 1 \geq k(v_2) + 1$ and also $core(v_2) = k(v_2) + 1 \geq k(v_1) + 1$, so that $core(v_1) = core(v_2)$. However, nodes in $A(N - 2)$ will not notify again their correct estimate from round $N$ on, and nodes $v_1$ and $v_2$ will perform the same estimate they had at round $N - 1$, i.e., $k(v_1) + 1 = core(v_1) = core(v_2) = k(v_2) + 1$. Thus, no message can be exchanged from round $N$ on, while $core(v_i) \neq k(v_i)$, $i = 1, 2$. But this contradicts the liveness property, so that it must be $T \leq N - 1$. □

From the proof, we observe that the nodes of minimal degree attain the correct coreness at the first round. We can 
slightly refine the bound as: 

Corollary 1. Let K be the number of nodes with minimal degree in G. Then the execution time on G is not larger than 
N — K + 1 rounds. 

It is worth remarking that the bound provided by Theorem 5 is tighter than that provided by Theorem 4 if and only if
the initial average estimation error $\frac{1}{N} \sum_{u \in V} [d(u) - k(u)]$ is larger than $1 - \frac{1}{N}$.
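The condition can be checked with one line of algebra. The sketch below assumes, as the discussion of the two bounds suggests, that Theorem 4 bounds the execution time by one plus the total initial error and that Theorem 5 bounds it by $N$; neither statement is reproduced in this passage, so both forms are assumptions.

```latex
% Assumed forms: Theorem 4 gives T <= 1 + \sum_u [d(u) - k(u)], Theorem 5 gives T <= N.
% Theorem 5 is the tighter of the two exactly when
\begin{align*}
N &< 1 + \sum_{u \in V} [d(u) - k(u)] \\
\iff\quad \frac{1}{N} \sum_{u \in V} [d(u) - k(u)] &> 1 - \frac{1}{N}.
\end{align*}
```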

Some important questions are (i) how tight is the bound of Theorem 5 and (ii) is there any graph that actually requires
$N$ rounds to complete? Experimental results with real-life graphs show that the bound is far from tight (graphs with
millions of nodes converge in fewer than one hundred rounds). However, we managed to identify a class of graphs close to
the bound, i.e., with execution time equal to $N - 1$ rounds for $N > 5$. Assuming that nodes are numbered from 1 to $N$,
the rules to build such graphs are:

• node $N$ is connected to all nodes apart from node $N - 3$;

• each node $i = 1, \ldots, N - 2$ is connected with its successor $i + 1$;

• node $N - 3$ is also connected with node $N - 1$.

Figure 3 shows the graph obtained by this scheme for $N = 12$. Graphically, it is convenient to represent node $N$
as the hub of a polygon, where the other nodes are located at the corners. All nodes have degree 3, apart from the hub,
which has degree $N - 2$, and node 1, which has degree 2. When our algorithm starts, node 1 acts as a trigger: it has the
smallest degree, and its broadcast causes node 2 to change its estimate to 2, which in turn causes node 3 to change its
estimate to 2, and so on until the estimate of node $N - 4$ changes to 2. Note that node $N$ changed its estimate from
$N - 2$ to 3 after the first round and has maintained this estimate so far. In the next round, nodes $N - 3$ and $N$ change
their estimate to 2; in the last round, nodes $N - 1$ and $N - 2$ change their estimate to 2 as well and the algorithm terminates.
Given that, during each round apart from the last two, at most one node changes its estimate, the total number of
rounds is exactly $N - 1$ ($N - 2$ plus the last round).

It is worth remarking that other simple structures one may think of as potential worst cases offer lower execution times.
As an example, a linear chain of size $N$ requires $\lceil N/2 \rceil$ rounds to converge.
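The construction rules and the chain example above can be explored with a small synchronous simulation. This is a minimal sketch, not the authors' Peersim code: it assumes the one-to-one update rule in which each node starts from its degree and lowers its estimate to the largest i such that at least i neighbors report an estimate of at least i, and it counts a round whenever at least one node sends a message. The function names (`compute_index`, `simulate`, `worst_case_graph`, `chain`) are illustrative. Since this round accounting may differ by a constant from the convention used in the text, the quantity of interest is how the count grows with N rather than its absolute value.

```python
import math


def compute_index(estimates, cap):
    """Largest i <= cap such that at least i neighbor estimates are >= i."""
    for i in range(cap, 0, -1):
        if sum(1 for e in estimates if e >= i) >= i:
            return i
    return 0


def simulate(adj):
    """Synchronous run; returns (final estimates, rounds with at least one sender)."""
    core = {u: len(nbrs) for u, nbrs in adj.items()}        # start from the degree
    est = {u: {v: math.inf for v in nbrs} for u, nbrs in adj.items()}
    senders = set(adj)                                       # initial broadcast
    rounds = 0
    while senders:
        rounds += 1
        outbox = {u: core[u] for u in senders}               # values sent this round
        receivers = set()
        for u, value in outbox.items():
            for v in adj[u]:
                est[v][u] = min(est[v][u], value)
                receivers.add(v)
        senders = set()
        for v in receivers:
            new_core = compute_index(est[v].values(), core[v])
            if new_core < core[v]:
                core[v] = new_core
                senders.add(v)                               # will notify in the next round
    return core, rounds


def worst_case_graph(n):
    """Graph built from the three rules above, nodes numbered 1..n (assumes n >= 5)."""
    adj = {u: set() for u in range(1, n + 1)}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for u in range(1, n):            # hub n: linked to every node except n - 3
        if u != n - 3:
            link(n, u)
    for i in range(1, n - 1):        # each i = 1..n-2 linked to its successor
        link(i, i + 1)
    link(n - 3, n - 1)               # extra link n-3 -- n-1
    return adj


def chain(n):
    """Linear chain 1 - 2 - ... - n."""
    adj = {u: set() for u in range(1, n + 1)}
    for i in range(1, n):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    return adj


if __name__ == "__main__":
    for n in (8, 12, 16):
        _, r_worst = simulate(worst_case_graph(n))
        _, r_chain = simulate(chain(n))
        print(n, r_worst, r_chain)
```

Under this accounting, the worst-case family should need one more round for every added node, whereas the chain needs about half as many rounds as it has nodes.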

One would expect a relation between diameter and execution time: the smaller the diameter,
the shorter the execution time should be. However, although we noticed a beneficial effect of small diameters, this does not
hold in general: in fact, the example of Figure 3 provides a case in which the convergence time increases linearly with $N$ while
the diameter is 3, i.e., a constant regardless of $N$.

Figure 3: The worst-case graph, for which the execution time is exactly $N - 1$ rounds, $N = 12$.

4.3 Message complexity

The maximum number of exchanged messages can be computed using a double counting argument: during the run of the
algorithm, each node $u$ can receive at most $d(v) - k(v)$ updates from each neighbor $v \in neighbor_V(u)$. Since $k(v) \ge 1$
for every node with at least one neighbor, there are at most $d(u) + d(v) - 2$ messages that can be exchanged over link $uv$.
If we sum over all the links, we obtain

$$\sum_{(u,v) \in E} [d(u) + d(v) - 2] = \sum_{v \in V(G)} d^2(v) - 2M \le 2M(\Delta - 1) \qquad (1)$$

where $\Delta$ is the maximum degree in the graph. Overall, we obtain the following worst-case bound:

Corollary 2. Given a graph $G = (V, E)$, the message complexity is bounded by $\sum_{v \in V(G)} d^2(v) - 2M$.

From the bound in (1), we can see that the message complexity of the distributed k-core computation is
$O(\Delta \cdot M)$.
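The double counting step can be verified numerically on a small example. The sketch below is only an illustration: the function name `message_bound_terms` and the toy graph are hypothetical, and the graph is assumed to be simple and undirected, with each link listed once.

```python
def message_bound_terms(edges):
    """Return (per-link sum, degree-square form, Delta bound) for a simple undirected graph."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    m = len(edges)
    # Sum over links of d(u) + d(v) - 2.
    per_link = sum(deg[u] + deg[v] - 2 for u, v in edges)
    # Each node v appears in d(v) links, contributing d(v) every time: sum of d(v)^2, minus 2M.
    square_form = sum(d * d for d in deg.values()) - 2 * m
    # Replacing one factor d(v) by the maximum degree Delta gives 2M(Delta - 1).
    delta_bound = 2 * m * (max(deg.values()) - 1)
    return per_link, square_form, delta_bound


# Hypothetical toy graph: a triangle 1-2-3 with a pendant node 4 attached to node 3.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
per_link, square_form, delta_bound = message_bound_terms(edges)
print(per_link, square_form, delta_bound)  # 10 10 16
```

The first two values coincide because both count the same quantity, once per link and once per node, and the third is the looser bound obtained from the maximum degree.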

5 Experimental evaluation 

This section reports experimental results for both the one-to-one and the one-to-many versions of the algorithm, over a
selection of graphs contained in the Stanford Large Network Dataset collection.² Undirected graphs have been transformed
into directed graphs by considering both directions (i.e., two edges) for each link present in the original one.
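The transformation just described amounts to emitting both orientations of every link. A minimal sketch follows; the function name `both_directions` is hypothetical, and dropping self-loops and duplicate links is an assumption of this example rather than something stated in the text.

```python
def both_directions(links):
    """Turn each undirected link {u, v} into the two directed edges (u, v) and (v, u)."""
    directed = set()
    for u, v in links:
        if u == v:
            continue  # assumption: self-loops are ignored
        directed.add((u, v))
        directed.add((v, u))
    return sorted(directed)  # the set removes links that were listed twice


print(both_directions([(1, 2), (2, 1), (2, 3)]))
# [(1, 2), (2, 1), (2, 3), (3, 2)]
```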

Simulations have been performed using Peersim [7]. Time is still measured in rounds, i.e., fixed-size time intervals
during which each node has the opportunity to send one update message to all its neighbors. Unless otherwise stated, the
results show the average over 50 experiments. Experiments differ in the (random) order in which the operations performed
at different nodes are considered in the simulation.

5.1 One-to-one version 

For this version, the main results are summarized in Table 1, which is divided into two parts. On the left, the main features
of each graph considered are reported: name, number of nodes, number of edges, diameter, maximum degree, and finally
maximum and average coreness.

On the right, the table reports information about the performance of the one-to-one protocol, based on two figures
of merit: execution time (measured as the number of rounds in which at least one node sends an update message) and
total number of messages exchanged. In particular, $t_{avg}$, $t_{min}$ and $t_{max}$ represent the average, minimum and maximum
execution time measured over 50 experiments; $m_{avg}$ and $m_{max}$ represent the average and maximum number of messages
per node.

A few observations are in order. First of all, the execution time is of the order of a few tens of rounds for most of
the graphs, with only a couple of them requiring on the order of one hundred rounds or more (web-BerkStan, the web graph of
Berkeley and Stanford, with about 306 rounds on average, and roadNet-TX, the road network of Texas, with about 99).
Compared with our theoretical upper bounds (number of nodes and total initial error), this suggests that our algorithm
can be efficiently used in real-world settings.



² http://snap.stanford.edu/data/
Name                        |V|        |E|        diam   d_max    k_max  k_avg  t_avg   t_min  t_max  m_avg  m_max
1) CA-AstroPh               18 772     198 110    14     504      56     12.62  19.55   18     21     47.21  807.05
2) CA-CondMat               23 133     93 497     15     280      25     4.90   15.65   14     17     13.97  410.25
3) p2p-Gnutella31           62 590     147 895    11     95       6      2.52   27.45   25     30     9.30   131.25
4) soc-sign-Slashdot090221  82 145     500 485    11     2 553    54     6.22   25.10   24     26     29.32  3 192.40
5) soc-Slashdot0902         82 173     582 537    12     2 548    56     7.22   21.15   20     22     31.35  3 319.95
6) Amazon0601               403 399    2 443 412  21     2 752    10     7.22   55.65   53     59     24.91  2 900.30
7) web-BerkStan             685 235    6 649 474  669    84 230   201    11.11  306.15  294    322    29.04  86 293.20
8) roadNet-TX               1 379 922  1 921 664  1 049  12       3      1.79   98.60   94     103    4.45   19.30
9) wiki-Talk                2 394 390  4 659 569  9      100 029  131    1.96   31.60   30     33     5.89   103 895.35

Table 1: Results with the one-to-one algorithm. Name of the data set, number of nodes, number of edges, diameter, 
maximum degree, maximum coreness, average coreness, average-minimum-maximum number of cycles to complete, 
average/maximum number of messages sent per node. 



k    #        25      50      75      100     125     150     175     200     225     250     275     300
1    55 776   14.12%  10.26%  7.36%   4.97%   2.99%   1.65%   0.92%   0.56%   0.21%   0.13%   0.08%   0.02%
2    83 109   3.81%   1.35%   0.55%   0.27%   0.14%   0.06%
3    67 910   1.42%   0.23%
4    44 548   0.95%   0.07%
5    68 728   0.46%   0.05%
6    35 985   3.48%   1.01%   0.01%
8    32 412   1.21%   0.46%   0.10%
9    28 042   0.18%
10   22 322   1.96%   0.64%
15   6 842    0.99%
55   2 548    50.78%  43.84%  36.77%  29.71%  22.76%  15.46%  8.40%   1.73%

Table 2: Information about nodes that are delaying the completion of the protocol in the web-BerkStan graph. The first column, k,
represents a coreness value; the second column, #, represents the size of the k-core, i.e., the number of nodes whose coreness is k; the
columns labeled t = 25, 50, . . . , 300 represent the percentage of nodes in the given core that do not know the correct coreness value
after t rounds. Empty cells correspond to 0%. All other coreness values are correctly computed at round 25.

The average and maximum number of messages per node is, in general, comparable to the average and maximum
degree of nodes. Clearly, nodes with several thousand neighbors will be more overloaded than others.

In order to understand why web-BerkStan requires so many rounds to complete, we performed an in-depth analysis
of the dynamic behaviour of the proposed algorithms. In particular, we considered, for each core, the time taken for all
nodes within it to reach the correct coreness value. Results are reported in Table 2. The first two columns report the
problematic cores and their cardinality, respectively. The remaining columns represent the percentage of nodes whose
estimate is still erroneous at round t = 25, 50, . . . , 300; an empty cell corresponds to 0%, i.e., the core computation
has been completed. At first glance, the 55-core seems particularly problematic, given that more than one half of it is still
incorrect at round 25. But the 55-core completes before round 225, well before the 1-core, which terminates after round 300.
Delays in computing the 1-core may be associated with the high diameter of this particular graph, with "deep" pages very
far away from the highest cores.

Another figure of merit is the temporal evolution of the error, measured as the difference, at each node, between the
current estimate of the coreness and its correct value. The left part of Figure 4 shows the average error for our experimental
graphs. When a line stops, the algorithm has reached the correct coreness estimate, so the error is zero.
The inset zooms in on the first rounds, to provide a closer look at the test cases that converge quickly. The right part
of Figure 4 shows the maximum error (computed over all nodes and over 50 experiments) for all our graphs (points have
been slightly translated to improve visualization). As can be seen, in all our experimental data sets the maximum error
is at most equal to 1 by cycle 22.

Figure 4: Evolution of evaluation error over time. On the left, the average error over all nodes and all repetitions is shown.
The smaller graph shows the details of the first rounds of the computation. The right part shows the maximum error over
all nodes and all repetitions.



These error figures tell us that if the exact computation of coreness is not required (for example, if coreness is used
to optimize gossip protocols in a social network), the proposed k-core decomposition algorithms may be stopped after a
predefined number of rounds, knowing that both the average and the maximum errors would be extremely low.
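The two error measures discussed above can be sketched as follows. The reference coreness is obtained here with a plain centralized peeling procedure (repeatedly removing a node of minimum remaining degree), which is a standard technique and not necessarily the one used for the experiments; the function names `coreness_by_peeling` and `error_stats` and the toy graph are illustrative.

```python
def coreness_by_peeling(adj):
    """Reference coreness: repeatedly remove a node of minimum remaining degree."""
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        u = min(remaining, key=degree.get)   # linear scan; adequate for a sketch
        k = max(k, degree[u])                # coreness never decreases along the peeling
        core[u] = k
        remaining.remove(u)
        for v in adj[u]:
            if v in remaining:
                degree[v] -= 1
    return core


def error_stats(estimate, truth):
    """Average and maximum difference between current estimates and true coreness."""
    errors = [estimate[u] - truth[u] for u in truth]
    return sum(errors) / len(errors), max(errors)


# Hypothetical toy graph: a triangle 1-2-3 with a pendant node 4 attached to node 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
truth = coreness_by_peeling(adj)
initial = {u: len(nbrs) for u, nbrs in adj.items()}   # estimates before any exchange
print(truth)                        # coreness 2 for nodes 1-3, 1 for node 4
print(error_stats(initial, truth))  # (0.25, 1)
```

Evaluating `error_stats` on the estimates available after a fixed number of rounds gives the kind of stopping criterion described above: if both values are already small, the run can be interrupted early.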

5.2 One-to-many version 

The main reason for running the one-to-many version of the protocol is to compute the k-core decomposition over large
graphs that cannot fit into the memory of a single machine. Experimental results showed that the number of rounds
needed to complete the protocol was equivalent to that of the one-to-one version. One of the key performance figures
to be considered for the one-to-many version is the communication overhead generated by update messages exchanged
among hosts. The overhead is computed as the average number of times a node generates a new estimate that has to be
sent to another host.

Figure 5 shows the overhead per node with a variable number of hosts, with (left) and without (right) a broadcast
medium available. For visualization reasons, only some of the original data sets have been considered, but the results are
similar for all of them. Twenty experiments were considered for this case. In the graph, the outcome of each experiment
is represented as a point (slightly translated for the sake of visualization clarity).

When a broadcast medium is not available and point-to-point communication is used, the overhead increases with the
number of hosts available, tending to stabilize at the levels of the one-to-one protocol (see the $m_{avg}$ column of Table 1;
values are slightly higher given that the optimization of Section 3.1.2 cannot be applied in this case). When a broadcast
medium is available, on the other hand, the efficiency is much higher. In this case, a single message is sent at each round,
containing all the estimates that have changed since the previous one. Most of the nodes reach the correct estimate after
a few rounds, and very few estimates are sent on their behalf after the first rounds; the effect is that the average number
of estimates sent per node is extremely low, always smaller than 3, making the one-to-many algorithm particularly
well-suited for clusters connected through fast local area networks.
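The difference between the two communication settings can be illustrated with a rough accounting sketch. It assumes, purely for illustration, that with point-to-point links a changed estimate is shipped once to every remote host holding a neighbor of the node, and that with a broadcast medium it is shipped once if at least one neighbor is remote; the exact accounting behind Figure 5 is not specified here. The function `estimates_to_ship`, the host labels, and the toy graph are hypothetical.

```python
def estimates_to_ship(adj, host_of, changed, broadcast):
    """Count estimate transmissions caused by the nodes in `changed` during one round."""
    sent = 0
    for u in changed:
        # Hosts other than u's own that are responsible for at least one neighbor of u.
        remote_hosts = {host_of[v] for v in adj[u]} - {host_of[u]}
        if not remote_hosts:
            continue                  # all neighbors are local: nothing leaves the host
        # Assumed accounting: one copy in total on a broadcast medium,
        # one copy per remote host with point-to-point communication.
        sent += 1 if broadcast else len(remote_hosts)
    return sent


# Hypothetical toy graph: a triangle 1-2-3 with a pendant node 4, spread over three hosts.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
host_of = {1: "A", 2: "C", 3: "B", 4: "B"}
print(estimates_to_ship(adj, host_of, changed=[3, 4], broadcast=False))  # 2
print(estimates_to_ship(adj, host_of, changed=[3, 4], broadcast=True))   # 1
```

Dividing such counts, accumulated over all rounds, by the number of nodes gives an overhead per node in the sense used above; under the point-to-point assumption it can only grow as neighbors are spread over more hosts.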



Figure 5: Overhead per node, with (left) and without (right) a broadcast medium.

6 Conclusions

To the best of our knowledge, this paper is the first to propose distributed algorithms for the k-core decomposition of
online and/or large graphs. While theoretical analysis provided us with fairly large upper bounds on the number of rounds
required to complete the algorithm, which are tight for specific worst-case examples, experimental results have shown
that, for realistic graphs, our algorithms converge efficiently in a few rounds.

The next logical step is the actual implementation of the algorithms. For this purpose, we are considering distributed
frameworks like Hadoop and Pregel [9], in which the computation is divided into logical units (corresponding to the
collection of nodes under the responsibility of a single host) and these units are divided among a collection of
computational processes, termed workers, in charge of processing them according to a set of defined rules. This would allow our
solutions to inherit the desirable features of these frameworks in terms of efficiency, scalability and fault tolerance.